Improving kNN Text Categorization by Removing Outliers from Training Set

Authors

Kwangcheol Shin

Ajith Abraham

Sang-Yong Han

Abstract

We show that excluding outliers from the training data significantly improves the kNN classifier, which in this case performs about 10% better than the best known method, the Centroid-based classifier. Outliers are the elements whose similarity to the centroid of the corresponding category is below a threshold.
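The filtering step described in the abstract can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the abstract does not specify the similarity measure, the threshold, or k, so the sketch assumes cosine similarity on length-normalized vectors, a centroid computed as the mean of a category's normalized vectors, and arbitrary values for `threshold` and `k`. The function names `remove_outliers` and `knn_predict` and the toy data are hypothetical.

```python
from collections import Counter

import numpy as np


def _normalize(X):
    """Scale each row to unit length; all-zero rows are left as zeros."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.where(norms == 0.0, 1.0, norms)


def remove_outliers(X, y, threshold):
    """Keep rows whose cosine similarity to their own category centroid
    is at least `threshold` (similarity measure and threshold are assumptions)."""
    Xn = _normalize(X)
    y = np.asarray(y)
    keep = np.zeros(len(y), dtype=bool)
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        centroid = Xn[idx].mean(axis=0)
        norm = np.linalg.norm(centroid)
        if norm > 0.0:
            centroid = centroid / norm
        # Rows are unit length, so the dot product is the cosine similarity.
        keep[idx] = (Xn[idx] @ centroid) >= threshold
    return Xn[keep], y[keep]


def knn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k training rows most cosine-similar to x."""
    y_train = np.asarray(y_train)
    sims = _normalize(X_train) @ _normalize([x])[0]
    nearest = np.argsort(-sims)[:k]
    return Counter(y_train[nearest].tolist()).most_common(1)[0][0]


# Toy data: the fourth row is labeled "a" but lies among the "b" documents.
X = np.array([
    [1.0, 0.1], [0.9, 0.2], [1.0, 0.0], [0.1, 1.0],
    [0.0, 1.0], [0.2, 0.9], [0.1, 1.0],
])
y = np.array(["a", "a", "a", "a", "b", "b", "b"])

X_clean, y_clean = remove_outliers(X, y, threshold=0.8)
prediction = knn_predict(X_clean, y_clean, [0.2, 1.0], k=3)
```

In this toy setup the mislabeled fourth row falls below the assumed threshold and is dropped before kNN votes. In practice the threshold would have to be tuned on held-out data.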

Similar resources

Svm Based Improvement in Knn for Text Categorization

In today's library science, information and computer science, online text classification or text categorization is a huge complication. [1] With the enormous growth of online information and data, text categorization has become one of the crucial techniques for handling and standardizing text data. Various learning algorithms have been applied on text for categorization. On the basis of ...

Using cellular automata for improving knn based spam filtering

With the rapid growth of the Internet, electronic mail (e-mail) has become a popular communication tool. However, junk mail, also known as spam, has increasingly become a part of life for users as well as internet service providers. To address this problem, many solutions have been proposed in the last decade. Currently, content-based anti-spam filtering methods are an important issue; the...

A ME Model Based on Feature Template for Chinese Text Categorization

With the arrival of the information society and the rapid development of the Internet, people can acquire more and more information. How to utilize Internet information efficiently and promptly has become a hotspot in information technology. Text categorization is an important component that helps extract useful messages from a tremendous amount of information. It assigns new documents to pre-define...

ML-KNN: A lazy learning approach to multi-label learning

Multi-label learning originated from the investigation of the text categorization problem, where each document may belong to several predefined topics simultaneously. In multi-label learning, the training set is composed of instances each associated with a set of labels, and the task is to predict the label sets of unseen instances through analyzing training instances with known label sets. In this...

An effective refinement strategy for KNN text classifier

Due to the exponential growth of documents on the Internet and the emergent need to organize them, the automated categorization of documents into predefined labels has received ever-increasing attention in recent years. A wide range of supervised learning algorithms has been introduced to deal with text classification. Among all these classifiers, K-Nearest Neighbors (KNN) is a widely use...

Publication date: 2006
